1. INTRODUCTION

Tax is the largest source of state income and finances government administration, public services,
and public infrastructure [1]. Because taxes are so important to the state, tax authorities must be
able to detect all types of tax evasion, by both individual and corporate taxpayers in Indonesia,
quickly, effectively, and efficiently. Indonesia applies a self-assessment taxation system: taxpayers
calculate, pay, and report the amount of tax they owe themselves, based on the tax laws and regulations.

A consequence of the self-assessment taxation system in Indonesia is a large volume of data from
Periodic Tax Returns and Annual Tax Returns. This data must be managed effectively and efficiently in
accordance with tax regulations. Effective means that the tax authorities supervise taxpayers based on
a compliance classification and determine taxpayer priorities based on predetermined variable criteria.
Efficient means that the tax authorities supervise taxpayers so that they fulfill their tax obligations
under the tax regulations without much time and effort. Furthermore, taxes are compulsory and provide
only indirect compensation; thus, taxpayers tend to avoid or evade tax, which harms the state through
reduced tax revenues.

Several studies on detecting tax avoidance have been conducted. The researchers in [2] developed a
system that automatically measures the dimensions of billboards without physically touching them in
order to calculate tax in Indonesia. The researchers in [3] used a hybrid intelligence system to detect
tax evasion by corporate taxpayers in the beverage and textile businesses in Iran. The researchers in
[4] investigated classic tax evasion cases using several methods aimed at classifying tax evasion
behavior based on a network simulated with real data. The researchers in [5] applied parallelism
techniques to improve the performance of fraud detection algorithms. Other researchers [6] detected
fraudulent tax invoices using various data mining techniques.

This study proposes a taxpayer supervision approach whose variables can be used for all types of
taxpayers and all types of tax in Indonesia. Each state has different tax regulations and
administration, so the advantage of this study over other studies is that it detects tax avoidance
specifically in Indonesia. Supervision in this study classifies taxpayer compliance into four classes
using data mining classification algorithms. The algorithms chosen are C4.5, Support Vector Machine
(SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), and Multilayer Perceptron (MLP). The algorithms are
compared using Fuzzy AHP and TOPSIS to determine which has the best classification performance, based
on the criteria of Accuracy, F-Score, and Time required. The supervision priorities, in order, are
formal and material non-compliant taxpayers, formal compliant taxpayers, material compliant taxpayers,
and formal and material compliant taxpayers. Each compliance class contains many taxpayers. Therefore,
taxpayers must be prioritized for examination and processing based on tax regulations. To determine
taxpayer priority, this study proposes the Fuzzy AHP and TOPSIS methods. The results show that
alternative taxpayer A233 is the top priority taxpayer with a preference value of 0.433, whereas
alternative taxpayer A051 is the lowest priority taxpayer with a preference value of 0.036.


2. RESEARCH METHOD 
This study proposes effective and efficient taxpayer supervision by classifying taxpayer compliance
into four classes. In general, the flow of this study is explained in Figure 1.

Figure 1. General description of the overall study


2.1. Selection Data

Data mining is the process of finding patterns and trends in large data so that predictions can be
made about that data [7, 8]. Variable selection in data mining is the process of identifying the
variables or attributes that matter most to the model for predicting the goal [9]. This study uses
data from the tax reporting of taxpayers in a certain region for the 2015-2017 tax years. The database
tables used are the Masterfile of taxpayers, Tax Reporting Data, Tax Payments, Arrears, Form 1771,
1771 J, 1771 IV, Form 1770, 1770 Appendix I Page 1, and 1770 Appendix III.


2.2. Preprocessing 

Preprocessing is the step that turns raw data into quality data [10]. The data is cleaned to correct
bad data, filter incorrect records out of the dataset, and reduce unnecessary detail [10].
Data aggregation is the preprocessing step that summarizes data [11]. The variables produced by data
aggregation are Late_Report, Not_Report, Late_To_Pay, GPM, NPM, and CTTOR. The Late_Report and
Not_Report variables accumulate the late reports and the unreported returns in one tax year, derived
from the tax reporting tables. Gross Profit Margin (GPM), Net Profit Margin (NPM), and Corporate Tax
Turn Over Ratio (CTTOR) are financial ratio analysis variables. The Taxable Entrepreneur (PKP) status
variable is the result of transforming the PKP date column in the masterfile of taxpayers table: "Yes"
if the PKP date is not null and "No" if the PKP date is null. Data transformation is the process of
consolidating data into forms that are suitable for data mining [11]. Formal and material compliant
taxpayers come from the list of compliant taxpayers that meet the following criteria: reporting late no
more than three times in a year, not reporting no more than three times in a year, and paying late no
more than three times in a year. The goals (class labels) come from data on taxpayers who corrected
their Tax Return or were issued a Tax Underpayment Assessment Letter (SKPKB).
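
The two transformations described above (the PKP status flag and the "no more than three times"
thresholds) can be sketched as follows. This is a minimal illustration, not the authors' pipeline:
the record fields (`pkp_date`, `late`, `missing`, `late_pay`) and both function names are
hypothetical, and the sample records are invented.

```python
from datetime import date
from typing import Optional


def pkp_status(pkp_date: Optional[date]) -> str:
    """Return "Yes" when a PKP date is recorded, otherwise "No"."""
    return "Yes" if pkp_date is not None else "No"


def meets_compliance_thresholds(late_report: int, not_report: int,
                                late_to_pay: int, limit: int = 3) -> bool:
    """Apply the 'no more than three times in a year' rule to each counter."""
    return all(count <= limit for count in (late_report, not_report, late_to_pay))


# Invented records; the field names are illustrative, not the real schema.
taxpayers = [
    {"id": "T1", "pkp_date": date(2014, 5, 1), "late": 1, "missing": 0, "late_pay": 2},
    {"id": "T2", "pkp_date": None, "late": 4, "missing": 0, "late_pay": 1},
]

summary = [
    (t["id"], pkp_status(t["pkp_date"]),
     meets_compliance_thresholds(t["late"], t["missing"], t["late_pay"]))
    for t in taxpayers
]
print(summary)
```

Taxpayer T2 fails the rule because it reported late four times, which exceeds the limit of three.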


2.3. Normality Test and Correlation Test 

This study uses the Kolmogorov-Smirnov normality test to determine whether the data is normally
distributed. The outcome of the normality test determines which correlation test is used to select the
dataset variables. The Kolmogorov-Smirnov test compares the distribution of the data with the standard
normal distribution [12]. The data is not normally distributed if the significance is below 0.05 [13].
The Kolmogorov-Smirnov test in this study was run in SPSS Statistics 23, and the results showed that
the study data is not normally distributed, so the Spearman correlation was used for the correlation
test.

The correlation test selects variables and ensures that they correlate with the goals to be
achieved. The Spearman correlation test is a nonparametric statistic used when one or both variables
are measured on an ordinal scale, or when both variables are quantitative but the normality condition
is not met [14]. The Spearman correlation test in this study was run in SPSS Statistics 23 to determine
the strength of the relationship between variables, guided by the correlation coefficient value:
0.00-0.25 indicates little or no relationship, 0.25-0.50 a fair degree of relationship, 0.50-0.75 a
moderate to good relationship, and above 0.75 a very good to excellent relationship [14].
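
As an illustration of the correlation step, the sketch below computes Spearman's coefficient as the
Pearson correlation of average ranks and maps it to the strength levels listed above. The study itself
used SPSS, so this is only an assumed equivalent. The function names are hypothetical, and because the
quoted ranges share their end points, treating each upper bound as inclusive is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Map |rho| to the strength levels quoted from [14] (upper bounds inclusive)."""
    r = abs(rho)
    if r <= 0.25:
        return "little or no relationship"
    if r <= 0.50:
        return "fair degree of relationship"
    if r <= 0.75:
        return "moderate to good relationship"
    return "very good to excellent relationship"


rho = spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(rho, 3), strength(rho))
```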


2.4. Classification Process 

The classification process analyzes data by extracting a model that describes important data
classes [11]. It is divided into two steps: a learning step, in which the classification model is
built, and a classification step, in which the model is used to predict goals for the given data [11].
Once the dataset has been created, the C4.5, SVM, KNN, NB, and MLP classification algorithms are
applied to it to find the one with the best classification performance.

C4.5 is a data mining classification algorithm that builds a decision tree [15]. In the first step,
the algorithm calculates the information gain of all attributes and selects the attribute with the
highest information gain as the root node (1). The gain is calculated by first finding the entropy (2),
which measures the diversity of the classes.


$$\mathrm{Gain}(y, A) = \mathrm{Entropy}(y) - \sum_{c \in \mathrm{values}(A)} \frac{|y_c|}{|y|} \, \mathrm{Entropy}(y_c) \tag{1}$$

where values(A) is the set of all possible values of attribute A, and $y_c$ is the subset of y where A has the value c.

$$\mathrm{Entropy}(y) = \sum_{i=1}^{n} -p_i \log_2 p_i \tag{2}$$

where $p_1, p_2, \ldots, p_n$ each express the proportions of class 1, class 2, ..., class n in output y.
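
Equations (1) and (2) can be checked on a toy example. The sketch below is illustrative only: the four
rows are invented, and the helper names `entropy` and `information_gain` are hypothetical rather than
part of any C4.5 implementation used in the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Equation (2): sum of -p_i * log2(p_i) over the classes present."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Equation (1): entropy of the whole set minus the weighted subset entropies."""
    gain = entropy([row[target] for row in rows])
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain


# Invented toy data using two of the dataset's variable names.
rows = [
    {"PKP_Status": "Yes", "Tax_Compliance": "compliant"},
    {"PKP_Status": "Yes", "Tax_Compliance": "compliant"},
    {"PKP_Status": "No", "Tax_Compliance": "non-compliant"},
    {"PKP_Status": "No", "Tax_Compliance": "compliant"},
]
gain = information_gain(rows, "PKP_Status", "Tax_Compliance")
print(round(gain, 4))
```

C4.5 would evaluate this quantity for every attribute and split on the one with the largest value.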
In the second step, a branch is made for each possible attribute value and the cases are divided
among the branches; this process is repeated for each new branch until all cases in a branch belong to
the same class.

Support Vector Machine is a supervised learning model for linear and non-linear classification of
data [16, 17]. The SVM method finds the optimal hyperplane that separates the feature points of two
different classes with the largest possible margin in the feature space. When the classification cases
are linearly separable, the separating function sought is the linear function in (3).


$$g(x) = \operatorname{sign}(f(x)) \tag{3}$$


with $f(x) = w^T x + b$, where $x, w \in \mathbb{R}^n$ and $b \in \mathbb{R}$. This classification
problem can be formulated as follows: find the set of parameters $(w, b)$ so that
$f(x_i) = \langle w, x_i \rangle + b = y_i$ for all i. Among the unlimited number of functions that can
separate the two sets of objects, this technique finds the best separator function. In two dimensions
the separator is a line, in three dimensions it is a plane, and in higher dimensions (more than three)
it is a hyperplane. The best hyperplane is the one located midway between the two sets of objects from
the two classes. Finding the best hyperplane is equivalent to maximizing the margin, which is the
distance between the hyperplane and the support vectors.
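
The linear decision rule in (3) and the notion of margin can be illustrated with a fixed, hand-picked
hyperplane. This sketch does not train an SVM: the weights `w`, the bias `b`, and the sample points
are assumptions chosen for the example, and the function names are hypothetical.

```python
def decision_value(w, x, b):
    """f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """g(x) = sign(f(x)), mapped to the class labels +1 and -1."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def margin(w, b, points):
    """Smallest distance from any point to the hyperplane f(x) = 0."""
    norm = sum(wi * wi for wi in w) ** 0.5
    return min(abs(decision_value(w, x, b)) / norm for x in points)


w, b = (1.0, 1.0), -3.0            # assumed hyperplane x1 + x2 - 3 = 0
points = [(1.0, 1.0), (3.0, 2.0)]  # invented samples on either side
labels = [classify(w, p, b) for p in points]
print(labels, round(margin(w, b, points), 4))
```

An actual SVM would search for the `w` and `b` that make this margin as large as possible.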

K-Nearest Neighbor is a supervised learning algorithm that classifies a new instance according to
the majority category among its k nearest neighbors [18]. The method calculates the similarity between
an unlabeled sample and all training samples [19]. The neighbors are determined by the shortest
distances from the query instance to the training samples. The Euclidean distance formula is often
used to measure the distance between a training object and a testing object.
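
A minimal sketch of this majority vote with Euclidean distance follows. The training points, the
labels "A" and "B", and the function name `knn_predict` are invented for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(training, query, k=3):
    """Return the majority label among the k training samples closest to query."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented two-dimensional samples: (features, label).
training = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((5.0, 5.0), "B"), ((6.0, 5.0), "B"),
]
print(knn_predict(training, (1.5, 1.5)), knn_predict(training, (5.5, 5.0)))
```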

Naive Bayes is a classification method based on applying Bayes' theorem with knowledge of
probability and statistics [20]. The Naive Bayes algorithm predicts future probabilities based on
previous experience using Bayes' theorem (4). The main characteristic of the Naive Bayes classifier is
a very strong (naive) assumption that each condition or event is independent of the others.


$$P(c \mid x) = \frac{P(x \mid c) \, P(c)}{P(x)} \tag{4}$$


where x is data with an unknown class, c is the hypothesis that the data belongs to a specific class,
$P(c \mid x)$ is the probability of the hypothesis given the condition (posterior probability), $P(c)$
is the probability of the hypothesis (prior probability), $P(x \mid c)$ is the probability of the
condition given the hypothesis, and $P(x)$ is the probability of x.
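
To make (4) concrete, the sketch below computes posteriors for two classes from assumed priors and
per-feature likelihoods, multiplying the likelihoods under the independence assumption. All
probabilities are invented, and the function names are hypothetical.

```python
def naive_likelihood(feature_probs):
    """P(x|c) under the independence assumption: product of per-feature P(x_j|c)."""
    result = 1.0
    for p in feature_probs:
        result *= p
    return result


def posterior(priors, likelihoods):
    """Equation (4) for every class; P(x) is the sum of P(x|c) * P(c) over classes."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


priors = {"compliant": 0.7, "non-compliant": 0.3}   # invented P(c)
likelihoods = {                                     # invented P(x_j|c) values
    "compliant": naive_likelihood([0.2, 0.5]),
    "non-compliant": naive_likelihood([0.6, 0.5]),
}
post = posterior(priors, likelihoods)
print({c: round(p, 4) for c, p in post.items()})
```

The predicted class is the one with the largest posterior, here "non-compliant".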

Multilayer Perceptron adopts the way neural networks work in living things: it has an architecture
that converts two or more inputs into an output [21]. The inspiration for the artificial neural network
(ANN) comes from the observation that learning in living things, especially humans, arises from very
complex networks of interconnected neurons. ANN emerged as an alternative to conventional approaches,
which are usually less flexible when the structure of a problem changes. ANN offers the advantage of
being able to handle several problems without drastic changes to the model.


2.5. Classification Algorithms Performance 

The confusion matrix, Percentage Split Test, and k-fold Cross-validation are used to determine the
performance of the C4.5, SVM, KNN, NB, and MLP classification algorithms in classifying the dataset
according to the goals of this study.

The confusion matrix evaluates the performance of a system by counting the data that is classified
correctly and incorrectly [22, 23]. The confusion matrix table shows the relationship between observed
and estimated values for evaluating data classification results [24]. Accuracy is the ratio of
correctly classified data to all data tested (5). Precision is the number of positive-category data
classified correctly divided by the total data classified as positive (6). Recall shows what
percentage of positive-category data is correctly classified by the system (7). F-score combines
Precision and Recall into a single quality measure and can be used as an alternative (8).


$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F-Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$


where TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative. 
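
The four measures can be computed directly from the confusion-matrix counts. In the sketch below the
function name and the TP/TN/FP/FN counts are invented; the final line applies the accuracy ratio to
the C4.5 cross-validation counts reported in Table 2 (2371 correct, 53 incorrect). The sketch assumes
that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (5) to (8) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)  # invented counts
print({name: round(value, 4) for name, value in metrics.items()})

# Accuracy as correctly classified / all tested, using the C4.5 (M1) counts of Table 2.
c45_m1_accuracy = round(2371 / (2371 + 53) * 100, 2)
print(c45_m1_accuracy)
```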

The Percentage Split Test divides the dataset into two parts, a training set and a test set, based
on the desired percentage (Figure 2) [25]. The k-fold Cross-validation method divides the dataset into
k parts, using k-1 parts as training data and 1 part as testing data. This study uses k = 10
(Figure 2) [26]. In this study, the Accuracy, F-Score, and Time required (Time) values from the
Percentage Split Test and k-fold Cross-validation methods are averaged for each classification
algorithm, and the averages are the basis for calculating the Alternative Data Normalization Matrix in
the TOPSIS method, which determines the algorithm with the best classification performance.
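
Both testing schemes amount to different ways of partitioning the record indices. The sketch below is
a simplified illustration with hypothetical function names: it splits the indices in their original
order and does not shuffle or stratify, so it should not be read as the exact procedure of the tool
used in the study.

```python
def percentage_split(n_samples, train_fraction=0.6):
    """Split indices into one training part and one test part."""
    cut = round(n_samples * train_fraction)
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=10):
    """Return k (train, test) pairs; each fold serves as the test set once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


train, test = percentage_split(2424, 0.6)
folds = k_fold_indices(2424, k=10)
print(len(train), len(test), [len(t) for _, t in folds])
```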





Figure 2. Algorithms testing methods: Percentage Split Test and k-fold Cross-validation


2.6. Selection Best Algorithm

Accuracy, F-Score, and Time required are the criteria for determining the classification algorithm
with the best performance. The criteria are weighted with Fuzzy AHP and the alternatives are ranked
with TOPSIS: Fuzzy AHP determines the weights of the specified criteria and TOPSIS ranks the selected
alternatives [27]. This study uses Fuzzy AHP and TOPSIS because the combination can provide good
results and is suitable for solving complex problems that are not too subjective [28]. Figure 3 shows
the hierarchy for determining the classification algorithm with the best performance in this study.

Figure 3. Hierarchy of purpose selection best algorithm


The Fuzzy AHP method combines the Analytic Hierarchy Process (AHP) with fuzzy functions to minimize
the vagueness that arises in the fuzzy ratios [29]. Fuzzy AHP combines the AHP ranking with a fuzzy
concept approach [30]. Fuzzy AHP is used to remedy a deficiency of the AHP method, namely in solving
problems that are not too subjective. The AHP method helps solve multi-criteria problems with several
alternative decisions [31-33]. It breaks the problem into parts that form a hierarchy of three levels:
objectives, criteria, and alternatives [34, 35]. To get better results, AHP is combined with other
techniques such as fuzzy logic [36].

The fuzzy set approach in AHP, first used by Zadeh, aims to handle the vagueness of human
thought [37, 38]. Fuzzy numbers make it possible to solve problems whose criteria are not precisely
defined [39]. To address this, Triangular Fuzzy Numbers (TFN) are used as AHP values. A TFN is divided
into three parts, namely l = lowest value, m = middle value, and u = highest value, with the
membership function in (9) [40, 41]:

$$\mu(x) = \begin{cases} \dfrac{x - l}{m - l}, & \text{if } x \in [l, m] \\ \dfrac{u - x}{u - m}, & \text{if } x \in [m, u] \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
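
The membership function in (9) can be evaluated directly. In this sketch the function name is
hypothetical, and returning 1.0 at x = m when l equals m is an assumption made to avoid a division by
zero for degenerate TFNs such as (1, 1, 1).

```python
def triangular_membership(x, l, m, u):
    """Membership degree of x in the triangular fuzzy number (l, m, u)."""
    if l <= x <= m:
        # Degenerate left side (l == m): treat the peak as full membership.
        return 1.0 if m == l else (x - l) / (m - l)
    if m < x <= u:
        return (u - x) / (u - m)
    return 0.0


tfn = (0.5, 1.0, 1.5)  # one of the TFN values that appears in a comparison matrix
values = [triangular_membership(x, *tfn) for x in (0.75, 1.0, 1.25, 2.0)]
print(values)
```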


The first steps of Fuzzy AHP are to create the hierarchical structure of the problem to be solved
and to build the pairwise comparison matrix between the criteria on the TFN scale. The next step is to
determine the fuzzy synthesis value (Si) using (10).


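$$S_i = \sum_{j=1}^{m} M_{g_i}^{j} \otimes \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} M_{g_i}^{j} \right]^{-1} \tag{10}$$

where $\sum_{j=1}^{m} M_{g_i}^{j}$ is the sum of each TFN value in a paired matrix (11)

$$\sum_{j=1}^{m} M_{g_i}^{j} = \left( \sum_{j=1}^{m} l_j, \; \sum_{j=1}^{m} m_j, \; \sum_{j=1}^{m} u_j \right) \tag{11}$$

while $\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} M_{g_i}^{j} \right]^{-1}$ is the inverse of the sum of the TFN values (12)

$$\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} M_{g_i}^{j} \right]^{-1} = \left( \frac{1}{\sum_{i=1}^{n} u_i}, \; \frac{1}{\sum_{i=1}^{n} m_i}, \; \frac{1}{\sum_{i=1}^{n} l_i} \right) \tag{12}$$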


After the fuzzy synthesis values are obtained, the next step is to find the degree of possibility
between them. The degree of possibility of $M_2 \ge M_1$, where $M_1 = (l_1, m_1, u_1)$ and
$M_2 = (l_2, m_2, u_2)$, is defined in (13).





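$$V(M_2 \ge M_1) = \sup_{y \ge x} \left[ \min\left( \mu_{M_1}(x), \mu_{M_2}(y) \right) \right] \tag{13}$$

So that the following possibilities are obtained (14):

$$V(M_2 \ge M_1) = \begin{cases} 1, & \text{if } m_2 \ge m_1 \\ 0, & \text{if } l_1 \ge u_2 \\ \dfrac{l_1 - u_2}{(m_2 - u_2) - (m_1 - l_1)}, & \text{otherwise} \end{cases} \tag{14}$$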


The next step is to calculate the defuzzification ordinate (d') and the weight vector. The
defuzzification ordinate must be calculated first, using (15).


$$d'(A_i) = \min V(S_i \ge S_k), \quad k = 1, 2, \ldots, n; \; k \ne i \tag{15}$$


After the defuzzification ordinate is obtained, the vector weight value is calculated with (16)


$$W' = (d'(A_1), d'(A_2), d'(A_3), \ldots, d'(A_n)) \tag{16}$$


where $A_i$ ($i = 1, 2, \ldots, n$) is a fuzzy vector.
Once the vector weight value is known, the next step is to normalize the fuzzy vector weight (W)
using (17), where each component is $d(A_i) = d'(A_i) / \sum_{k=1}^{n} d'(A_k)$.


$$W = (d(A_1), d(A_2), d(A_3), \ldots, d(A_n)) \tag{17}$$
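
The Fuzzy AHP steps in (10) to (17) can be traced on the three-criterion comparison matrix reported
in Table 4. The sketch below is an illustrative reimplementation with hypothetical function names,
not the authors' code. It assumes that the 0.7 entries printed in Table 4 are 2/3 rounded to one
decimal; under that assumption the result should land close to the weights reported in Table 6.

```python
def fuzzy_synthesis(matrix):
    """Equations (10) to (12): row sums multiplied by the inverse of the grand total."""
    row_sums = [tuple(sum(cell[p] for cell in row) for p in range(3)) for row in matrix]
    total_l = sum(r[0] for r in row_sums)
    total_m = sum(r[1] for r in row_sums)
    total_u = sum(r[2] for r in row_sums)
    return [(l / total_u, m / total_m, u / total_l) for l, m, u in row_sums]


def possibility(m2, m1):
    """Equation (14): degree of possibility V(M2 >= M1) for two TFNs."""
    l1, mid1, _ = m1
    _, mid2, u2 = m2
    if mid2 >= mid1:
        return 1.0
    if l1 >= u2:
        return 0.0
    return (l1 - u2) / ((mid2 - u2) - (mid1 - l1))


def fuzzy_ahp_weights(matrix):
    """Equations (15) to (17): defuzzification ordinates and normalized weights."""
    s = fuzzy_synthesis(matrix)
    d_prime = [min(possibility(s[i], s[k]) for k in range(len(s)) if k != i)
               for i in range(len(s))]
    total = sum(d_prime)
    return d_prime, [d / total for d in d_prime]


T = 2 / 3  # assumption: Table 4 prints this value as 0.7
matrix = [  # rows and columns: Accuracy (C1), F-Score (C2), Time required (C3)
    [(1, 1, 1), (0.5, 1, 1.5), (1, 1.5, 2)],
    [(T, 1, 2), (1, 1, 1), (1, 1.5, 2)],
    [(0.5, T, 1), (0.5, T, 1), (1, 1, 1)],
]
d_prime, weights = fuzzy_ahp_weights(matrix)
print([round(d, 3) for d in d_prime], [round(w, 3) for w in weights])
```

Unlike the unclamped values printed in Table 5 (for example 1.365), `possibility` caps the degree at 1
as (14) prescribes; the minimum in (15) is unaffected by this difference.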


The Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) ranks alternatives by
their similarity to an ideal solution [36]. TOPSIS was first introduced by C. L. Hwang and K. Yoon in
1981 [42, 43]. Many ranking methods can be used in Multi-Criteria Decision Making (MCDM), one of which
is TOPSIS [44]. MCDM was developed to provide solutions in the decision-making process [45]. The
TOPSIS method has six steps [46, 47], namely:

Step 1: Calculate the normalized decision matrix (18).

$$n_{ab} = \frac{x_{ab}}{\sqrt{\sum_{a=1}^{m} x_{ab}^{2}}} \tag{18}$$

Step 2: Calculate the weighted normalized decision matrix (19).

$$v_{ab} = w_b \, n_{ab}, \quad a = 1, 2, 3, \ldots, m; \; b = 1, 2, 3, \ldots, n \tag{19}$$

where $w_b$ is the weight of the b-th criterion.

Step 3: Calculate the positive ideal solutions $A^{+}$ and negative ideal solutions $A^{-}$ (20).

$$A^{+} = \{v_1^{+}, v_2^{+}, \ldots, v_n^{+}\}, \qquad A^{-} = \{v_1^{-}, v_2^{-}, \ldots, v_n^{-}\} \tag{20}$$

Step 4: Using the n-dimensional Euclidean distance, calculate the separation of each alternative from
the positive ideal solution ($D_a^{+}$) and from the negative ideal solution ($D_a^{-}$) (21).

$$D_a^{+} = \sqrt{\sum_{b=1}^{n} (v_{ab} - v_b^{+})^{2}}, \qquad D_a^{-} = \sqrt{\sum_{b=1}^{n} (v_{ab} - v_b^{-})^{2}} \tag{21}$$

Step 5: Determine the preference value with (22).

$$C_a = \frac{D_a^{-}}{D_a^{+} + D_a^{-}} \tag{22}$$

Step 6: Rank the alternatives from the largest to the lowest preference value.
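
The six steps can be sketched in a few lines. The example below feeds in the averaged Accuracy,
F-Score, and Time values from Table 3 and the weights from Table 6, so its output can be compared with
Table 9. It is an illustrative reimplementation with a hypothetical `topsis` function, not the
authors' code. It treats Time as a cost criterion (smaller is better), which is how the ideal
solutions in Table 8 are formed, and it assumes that the alternatives are not all identical.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return the preference value (22) of each alternative, one per row of matrix."""
    n_alt, n_crit = len(matrix), len(weights)
    # Steps 1 and 2: vector normalization, then weighting.
    norms = [sqrt(sum(matrix[a][b] ** 2 for a in range(n_alt))) for b in range(n_crit)]
    v = [[weights[b] * matrix[a][b] / norms[b] for b in range(n_crit)]
         for a in range(n_alt)]
    # Step 3: ideal solutions; a cost criterion swaps the roles of max and min.
    columns = list(zip(*v))
    best = [max(col) if benefit[b] else min(col) for b, col in enumerate(columns)]
    worst = [min(col) if benefit[b] else max(col) for b, col in enumerate(columns)]
    # Steps 4 and 5: Euclidean separations and preference values.
    preferences = []
    for row in v:
        d_plus = sqrt(sum((x - p) ** 2 for x, p in zip(row, best)))
        d_minus = sqrt(sum((x - q) ** 2 for x, q in zip(row, worst)))
        preferences.append(d_minus / (d_plus + d_minus))
    return preferences


names = ["C4.5", "SVM", "KNN", "NB", "MLP"]
averages = [  # accuracy (%), F-score, time (s): the "A" columns of Table 3
    [97.73, 0.98, 0.12],
    [88.10, 0.88, 3.19],
    [86.14, 0.86, 0.00],
    [77.09, 0.75, 0.04],
    [89.13, 0.89, 71.02],
]
prefs = topsis(averages, weights=[0.381, 0.381, 0.237], benefit=[True, True, False])
# Step 6: rank from the largest to the lowest preference value.
ranking = sorted(zip(names, prefs), key=lambda pair: pair[1], reverse=True)
print([(name, round(value, 3)) for name, value in ranking])
```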


2.7. Taxpayer Supervision Priorities 

After the algorithm with the best classification performance is found, taxpayers are prioritized
for supervision. Based on the level of taxpayer compliance, supervision priority follows this order:
formal and material non-compliant taxpayers, formal compliant taxpayers, material compliant taxpayers,
and formal and material compliant taxpayers.

This study uses the taxpayers that the best classification algorithm assigns to the formal and
material non-compliant class as an example, because supervision priorities are determined with the
same method for the other levels of compliance. The twenty-five dataset variables are the criteria for
determining taxpayer supervision priorities. The taxpayers in the classification result are the
alternatives to be ranked against these twenty-five criteria. Fuzzy AHP is used to weight the criteria
and TOPSIS is used to rank the alternatives. The flow for determining which taxpayers are prioritized
for supervision in this study is shown in Figure 4.








Figure 4. Hierarchy of purpose taxpayers supervision priorities


3. RESULTS AND DISCUSSION
The results of the preprocessing process, normality test, and correlation test in this study produce
dataset variables as shown in Table 1.


Table 1. The Variables of The Research Dataset

| No | Variable | Description | Type |
|---|---|---|---|
| 1 | PKP_Status | Yes = Taxpayers are taxable entrepreneurs, No = Non-taxable entrepreneurs | Nominal |
| 2 | Type_of_taxpayer | Badan = Corporate taxpayer, OP = Individual taxpayer | Nominal |
| 3 | Sector | The sector code of Taxpayer business | Nominal |
| 4 | Late_Report | Late Amount of Report in one tax year | Numeric |
| 5 | Not_Report | Amount Not Reported in one tax year | Numeric |
| 6 | Late_to_pay | Late Pay Amount in one tax year | Numeric |
| 7 | Arrears | Arrears Value of SKPKB for annual tax returns that have not been paid off | Numeric |
| 8 | Turnover | Turnover | Numeric |
| 9 | VAT | Value Added Tax (VAT) with MAP Code = 411211 | Numeric |
| 10 | Fiscal_Net_Income | Value of Fiscal Net Income | Numeric |
| 11 | PKP | Value of Taxable Income | Numeric |
| 12 | Compensation | Value of Compensation for Losses | Numeric |
| 13 | Income_Tax_payable | Value of Income Tax payable | Numeric |
| 14 | PPh29_payment | Value of Income Tax Article 29 payment with MAP Code = 411125 or 411126 with payment type code = 200 | Numeric |
| 15 | Installment_Psl25 | The instalment amount of Article 25 Income Tax per month | Numeric |
| 16 | Psl25_Payment | Value of Income Tax Article 29 payment with MAP Code = 411125 or 411126 with payment type code = 100 | Numeric |
| 17 | HPP | Cost of goods sold | Numeric |
| 18 | DPP_Peng_TB | Basic Imposition of Taxes Transfer of Rights to Land and/or Buildings | Numeric |
| 19 | 42_TB_Payment | Payment of Final PPh Article 4 paragraph (2) for the transfer of rights to land and/or buildings with MAP Code = 411128 with payment type code = 402 | Numeric |
| 20 | DPP_Construction | Tax Base for construction business | Numeric |
| 21 | DPP_PP46 | Basic Imposition of Certain Income Taxes (PP 46) | Numeric |
| 22 | PP46_Payment | Payment of PP 46 with MAP Code = 411128 and payment type code = 420 | Numeric |
| 23 | GPM | Gross Profit Margin | Numeric |
| 24 | NPM | Net Profit Margin | Numeric |
| 25 | CTTOR | Corporate Tax Turn Over Ratio | Numeric |
| 26 | Tax_Compliance | Goals of this study | Nominal |








This study uses Weka version 3.8.1 to test the dataset, which contains 2424 records. The C4.5,
Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), and Multilayer Perceptron
(MLP) classification algorithms were tested and analyzed with a confusion matrix to show their
classification performance. The test models used are 10-fold Cross-validation (M1) and Percentage
Split of 60% (M2). Table 2 shows the confusion matrix results of the 10-fold Cross-validation (M1) and
Percentage Split 60% (M2) testing methods for each classification algorithm. Comparison of the
classification results with the average value (A) of M1 and M2 is shown in Table 3.


Table 2. Confusion Matrix on M1 and on M2

| Result | C4.5 M1 | C4.5 M2 | SVM M1 | SVM M2 | KNN M1 | KNN M2 | NB M1 | NB M2 | MLP M1 | MLP M2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Correctly Classified | 2371 | 950 | 2126 | 861 | 2081 | 841 | 1871 | 749 | 2141 | 875 |
| Incorrectly Classified | 53 | 23 | 298 | 112 | 343 | 132 | 553 | 224 | 283 | 98 |


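Table 3. Comparison on M1 and on M2

| Metric | C4.5 M1 | C4.5 M2 | C4.5 A | SVM M1 | SVM M2 | SVM A | KNN M1 | KNN M2 | KNN A | NB M1 | NB M2 | NB A | MLP M1 | MLP M2 | MLP A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Precision | 0.98 | 0.98 | 0.98 | 0.88 | 0.89 | 0.89 | 0.86 | 0.87 | 0.86 | 0.74 | 0.72 | 0.73 | 0.88 | 0.90 | 0.89 |
| Recall | 0.98 | 0.98 | 0.98 | 0.88 | 0.89 | 0.88 | 0.86 | 0.86 | 0.86 | 0.77 | 0.77 | 0.77 | 0.88 | 0.90 | 0.89 |
| F-Score | 0.98 | 0.98 | 0.98 | 0.88 | 0.89 | 0.88 | 0.86 | 0.86 | 0.86 | 0.75 | 0.74 | 0.75 | 0.88 | 0.90 | 0.89 |
| Accuracy (%) | 97.81 | 97.64 | 97.73 | 87.71 | 88.49 | 88.10 | 85.85 | 86.43 | 86.14 | 77.19 | 76.98 | 77.09 | 88.33 | 89.93 | 89.13 |
| Time (seconds) | 0.11 | 0.13 | 0.12 | 3.34 | 3.04 | 3.19 | 0.00 | 0.00 | 0.00 | 0.05 | 0.02 | 0.04 | 72.67 | 69.37 | 71.02 |
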
Selecting the best algorithm involves two phases, Fuzzy AHP and TOPSIS. Fuzzy AHP is used to obtain
fuzzy vector weights for the criteria of Accuracy (C1), F-Score (C2), and Time required (C3), whereas
TOPSIS is used to rank all alternatives by preference value, from largest to smallest. The
alternatives are C4.5 (A1), SVM (A2), KNN (A3), NB (A4), and MLP (A5). The initial phase builds the
AHP comparison matrix of the criteria, converts it into the fuzzy AHP comparison matrix, and
calculates the fuzzy synthesis value (Si) using (10), as shown in Table 4.


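Table 4. Comparison Matrix of Each Criterion and Value of Si

| C | C1 L | C1 M | C1 U | C2 L | C2 M | C2 U | C3 L | C3 M | C3 U | Si L | Si M | Si U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.5 | 1.0 | 1.5 | 2.0 | 0.20 | 0.38 | 0.63 |
| C2 | 0.7 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.5 | 2.0 | 0.21 | 0.38 | 0.70 |
| C3 | 0.5 | 0.7 | 1.0 | 0.5 | 0.7 | 1.0 | 1.0 | 1.0 | 1.0 | 0.16 | 0.25 | 0.42 |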





The next steps calculate the degree of possibility using (13) and the defuzzification ordinate
value (d') using (15). The results are shown in Table 5. The final steps of Fuzzy AHP calculate the
weight vector values using (16) and normalize the fuzzy weight vector using (17). The results are
shown in Table 6.


Table 5. Degree of Possibility and Defuzzification Ordinate

| Criteria | Degree of possibility | Summary of Degree | d' |
|---|---|---|---|
| C1 | C1 >= C2 | 1 | 1 |
| C1 | C1 >= C3 | 1.365 | |
| C2 | C2 >= C1 | 1 | 1 |
| C2 | C2 >= C3 | 1.303 | |
| C3 | C3 >= C1 | 0.636 | 0.622 |
| C3 | C3 >= C2 | 0.622 | |


Table 6. Value of The Weight Vector

| Criteria | W' | W |
|---|---|---|
| C1 | 1 | 0.381 |
| C2 | 1 | 0.381 |
| C3 | 0.622 | 0.237 |


After the weight of each criterion is obtained as in Table 6, the second phase ranks the
alternatives with the TOPSIS method to determine the best algorithm. The first steps of the TOPSIS
method create the Alternative Data Normalization Matrix using (18) and the Weighted Alternative Data
Normalization Matrix using (19), based on the weight vector shown in Table 6. Both matrices are shown
in Table 7.


Table 7. Alternative Data Normalization Matrix and The Weighted Alternative Data Normalization Matrix

| Alternative | Normalization C1 | Normalization C2 | Normalization C3 | Weighted C1 | Weighted C2 | Weighted C3 |
|---|---|---|---|---|---|---|
| A1 | 0.497 | 0.501 | 0.002 | 0.19 | 0.191 | 0 |
| A2 | 0.448 | 0.45 | 0.045 | 0.171 | 0.172 | 0.011 |
| A3 | 0.438 | 0.439 | 0 | 0.167 | 0.168 | 0 |
| A4 | 0.392 | 0.383 | 0.001 | 0.15 | 0.146 | 0 |
| A5 | 0.454 | 0.455 | 0.999 | 0.173 | 0.173 | 0.237 |





The next step calculates the Positive Ideal Solutions Matrix and Negative Ideal Solutions Matrix
using (20); the results are shown in Table 8. These values are used to calculate the Alternative
Distance values using (21) and the preference value of each alternative using (22), as shown in
Table 9. Finally, the TOPSIS method ranks the alternatives from the largest preference value to the
smallest, as shown in Table 10.


Table 8. Positive Ideal Solutions Matrix and Negative Ideal Solutions Matrix

| Alternative | C1 | C2 | C3 |
|---|---|---|---|
| A+ | 0.19 | 0.191 | 0 |
| A- | 0.15 | 0.146 | 0.237 |





Table 9. Alternative Distance and Value of Preference

| Alternative | Distance+ (D+) | Distance- (D-) | Preference |
|---|---|---|---|
| A1 | 0.0000 | 0.2449 | 0.998 |
| A2 | 0.0000 | 0.2280 | 0.887 |
| A3 | 0.0447 | 0.2366 | 0.880 |
| A4 | 0.0632 | 0.2366 | 0.797 |
| A5 | 0.2366 | 0.0447 | 0.131 |


Table 10. Ranking Alternative

| Ranking | Alternative | Preference |
|---|---|---|
| 1 | A1 | 0.998 |
| 2 | A2 | 0.887 |
| 3 | A3 | 0.880 |
| 4 | A4 | 0.797 |
| 5 | A5 | 0.131 |





Table 10 shows that C4.5 (A1) has the best classification performance because it has the largest
preference value. After the best algorithm is found, the next step is to determine the priority of the
taxpayers to be supervised. Prioritization in this study is divided into two phases: Fuzzy AHP to
obtain fuzzy vector weights and TOPSIS to rank all alternatives by preference value. The criteria for
obtaining the fuzzy vector weights are PKP_Status (C1), Type_of_taxpayer (C2), Sector (C3), Late_Report
(C4), Not_Report (C5), Late_to_pay (C6), Arrears (C7), Turnover (C8), VAT (C9), Fiscal_Net_Income
(C10), PKP (C11), Compensation (C12), Income_Tax_payable (C13), PPh29_payment (C14),
Installment_Psl25 (C15), Psl25_Payment (C16), HPP (C17), DPP_Peng_TB (C18), 42_TB_Payment (C19),
DPP_Construction (C20), DPP_PP46 (C21), PP46_Payment (C22), GPM (C23), NPM (C24), and CTTOR (C25).
This study ranks the taxpayers in the formal and material non-compliant class produced by the best
algorithm, which contains 338 taxpayers. The first step in determining the priority of taxpayers to be
supervised is to build the AHP comparison matrix of the criteria, as shown in Table 11.

The next steps build the fuzzy AHP comparison matrix of the criteria on the TFN scale and calculate
the fuzzy synthesis (Si) using (10) based on Table 11. The fuzzy synthesis values are shown in
Table 12. The degree of possibility is then calculated using (13) and the defuzzification ordinate
values (d') using (15). Finally, the vector weight values are calculated using (16) and the fuzzy
weight vector is normalized using (17), as shown in Table 13.


Table 11. Comparison Matrix of Each Criterion 

Cor C02 C03 C04 CO5 C06 CO7 C08 CO9 C10 Cll Ci2 Cl3 Cl4 C15 Cl6 Cl7 C18 C19 C20 C21 C22 C23 C24 C25 
cor 10 50 30 30 30 40 01 03 02 03 03 03 03 02 30 02 03 30 02 30 03 02 02 02 03 
CO2 0.2 10 03 30 3.0 30 01 03 02 30 03 05 03 02 5.0 02 03 20 02 20 03 02 02 02 02 
CO3 03 30 10 40 40 40 0.1 03 02 40 03 03 03 02 3.0 02 03 03 02 03 03 02 02 02 03 
Co4 03 03 03 10 05 10 0.1 03 03 03 03 05 03 03 03 03 03 05 03 05 05 03 03 03 03 
COS 03 03 03 20 10 20 0.1 03 03 03 03 05 03 03 03 03 03 05 03 05 05 03 03 03 03 
C06 03 03 03 10 05 10 0.1 03 03 03 03 05 03 03 03 03 03 05 03 05 05 03 03 03 03 
CO7 7.0 7.0 70 70 7.0 7.0 10 5.0 30 40 50 50 50 30 40 3.0 40 40 30 50 5.0 30 3.0 3.0 20 
CO8 40 40 40 30 3.0 30 02 10 03 30 30 50 40 03 3.0 03 30 40 03 30 30 03 03 03 03 
coo 5.0 50 50 4.0 40 40 03 3.0 10 30 50 5.0 5.0 10 50 10 3.0 30 10 30 3.0 10 30 30 20 
C10 40 03 03 30 30 30 03 03 03 10 10 30 20 03 30 03 30 30 03 30 30 03 03 03 03 
Cll 40 40 40 30 30 30 02 03 02 10 10 30 20 03 30 03 03 30 03 30 30 03 03 03 03 
C12 30 20 30 20 20 20 02 02 02 03 03 10 03 02 30 03 03 03 03 03 03 03 03 03 03 
C13 40 40 40 30 30 30 02 03 02 05 05 30 10 03 05 03 03 05 03 05 03 03 03 03 03 
Cl4 50 50 50 30 30 30 03 3.0 10 30 30 50 30 10 30 10 30 30 10 30 30 10 3.0 3.0 20 
C15 03 02 03 30 30 30 03 03 02 03 03 03 20 03 10 03 03 20 03 30 30 03 03 03 03 
C16 5.0 50 50 30 30 30 03 4.0 10 30 30 30 30 10 30 10 30 30 10 30 30 10 3.0 3.0 20 
C17 40 40 40 30 30 30 03 03 03 03 30 40 30 03 30 03 10 30 03 30 30 03 03 03 03 
C18 03 05 30 20 20 20 03 03 03 03 03 30 20 03 05 03 03 10 03 10 10 03 03 03 03 
C19 5.0 50 50 30 30 30 03 3.0 10 40 40 40 30 10 3.0 10 30 30 10 30 30 10 3.0 3.0 20 
C20 03 05 30 20 20 20 02 03 03 03 03 30 20 03 03 03 03 10 03 10 20 03 03 03 03 
C21 3.0 30 30 20 20 20 02 03 03 03 03 40 30 03 03 03 03 10 03 05 10 03 03 03 03 
C22 5.0 50 50 30 30 30 03 3.0 10 30 30 40 30 10 30 10 30 30 10 30 30 10 3.0 3.0 20 
C23 5.0 5.0 50 30 30 40 03 3.0 03 30 30 30 30 03 3.0 03 30 30 03 30 30 03 10 05 03 
C24 5.0 50 50 40 40 40 03 3.0 03 30 30 30 30 03 3.0 03 30 30 03 30 30 03 20 10 05 
C25 40 5.0 40 30 3.0 3.0 05 3.0 05 3.0 30 30 30 05 3.0 05 30 30 05 30 30 05 3.0 20 10 








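Table 12. Value of Si

| Si | C01 | C02 | C03 | C04 | C05 | C06 | C07 | C08 | C09 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 | C24 | C25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 0.02 | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.01 | 0.02 | 0.03 | 0.02 | 0.03 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.03 | 0.03 | 0.03 | 0.03 |
| M | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.08 | 0.05 | 0.06 | 0.04 | 0.04 | 0.03 | 0.04 | 0.05 | 0.03 | 0.05 | 0.04 | 0.03 | 0.05 | 0.03 | 0.03 | 0.05 | 0.05 | 0.05 | 0.05 |
| U | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.13 | 0.08 | 0.10 | 0.07 | 0.07 | 0.05 | 0.07 | 0.09 | 0.06 | 0.09 | 0.08 | 0.06 | 0.09 | 0.06 | 0.06 | 0.09 | 0.09 | 0.09 | 0.10 |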





Table 13. Value of The Weight Vector

| Weight | C01 | C02 | C03 | C04 | C05 | C06 | C07 | C08 | C09 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 | C24 | C25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| W' | 0.13 | 0.07 | 0.09 | 0.12 | 0.12 | 0.10 | 1.00 | 0.51 | 0.72 | 0.32 | 0.36 | 0.07 | 0.36 | 0.62 | 0.17 | 0.61 | 0.44 | 0.18 | 0.64 | 0.17 | 0.25 | 0.61 | 0.56 | 0.59 | 0.63 |
| W | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.11 | 0.05 | 0.08 | 0.03 | 0.04 | 0.01 | 0.04 | 0.07 | 0.02 | 0.07 | 0.05 | 0.02 | 0.07 | 0.02 | 0.03 | 0.07 | 0.06 | 0.06 | 0.07 |

After the weight of each criterion is obtained as shown in Table 13, the second phase ranks the
alternatives using the TOPSIS method. The initial steps create the Alternative Data Normalization
Matrix using (18) and the Weighted Alternative Data Normalization Matrix using (19). The weighted
matrix is then used to calculate the Positive Ideal Solutions Matrix and Negative Ideal Solutions
Matrix using (20), as shown in Table 14.


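Table 14. Positive Ideal Solutions Matrix and Negative Ideal Solutions Matrix

| Solution | C01 | C02 | C03 | C04 | C05 | C06 | C07 | C08 | C09 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 | C24 | C25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A+ | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.08 | 0.02 | 0.05 | 0.02 | 0.02 | 0.01 | 0.02 | 0.05 | 0.01 | 0.03 | 0.01 | 0.01 | 0.06 | 0.01 | 0.02 | 0.04 | 0.01 | 0.02 | 0.03 |
| A- | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.01 | 0.00 |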





The values of the Positive Ideal Solutions Matrix and Negative Ideal Solutions Matrix in Table 14
are used to calculate the Alternative Distance values using (21). The preference value of each
alternative is then calculated using (22). The final result, shown in Table 15, ranks the alternatives
from the largest preference value to the smallest and thereby determines the priority of taxpayers to
be supervised. The taxpayer with alternative A233 is the first priority for supervision, while the
taxpayer with alternative A051 is the last priority.


Table 15. Ranking Alternative

| Ranking | Alternative | Preference |
|---|---|---|
| 1 | A233 | 0.433 |
| 2 | A163 | 0.362 |
| 3 | A240 | 0.330 |
| 4 | A316 | 0.326 |
| 5 | A151 | 0.304 |
| ... | ... | ... |
| 337 | A035 | 0.044 |
| 338 | A051 | 0.036 |





4. CONCLUSION 

Table 10 shows that the C4.5 (A1) algorithm has the best classification performance for classifying
taxpayer compliance on the dataset of this study, with a preference value of 0.998, whereas the MLP
(A5) algorithm has the lowest, with a preference value of 0.131. Table 15 shows that the Fuzzy AHP and
TOPSIS methods rank taxpayer priorities for supervision at each level of compliance from the
classification results of the best algorithm. Alternative taxpayer A233 is the top priority taxpayer
with a preference value of 0.433, whereas alternative taxpayer A051 is the lowest priority taxpayer
with a preference value of 0.036.

Further studies can expand the data sources used for data mining selection. Tax avoidance can be
detected as early as possible if the criteria used are comprehensive and keep up with current tax
avoidance behavior.